Cache-oblivious Algorithms

Authors

  • Harald Prokop
  • Charles E. Leiserson
  • Arthur C. Smith
Abstract

This thesis presents “cache-oblivious” algorithms that use asymptotically optimal amounts of work and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. The cache complexity of computing $n$ time steps of a Jacobi-style multipass filter on an array of size $n$ is $\Theta(1 + n/L + n^2/(ZL))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(m + n + p + (mn + np + mp)/L + mnp/(L\sqrt{Z}))$ cache misses. We introduce an “ideal-cache” model to analyze our algorithms, and we prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels. We further prove that any optimal cache-oblivious algorithm is also optimal in the previously studied HMM and SUMH models. Algorithms developed for these earlier models are perforce cache-aware: their behavior varies as a function of hardware-dependent parameters, which must be tuned to attain optimality. Our cache-oblivious algorithms achieve the same asymptotic optimality on all these models, but without any tuning.

Thesis Supervisor: Charles E. Leiserson
Title: Professor of Computer Science and Engineering
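To make the idea of a cache-oblivious algorithm concrete, the following is a minimal sketch of a divide-and-conquer matrix transpose in the spirit of the abstract. It is an illustration under stated assumptions, not the thesis's own code: the names `transpose`, `transpose_recursive`, and the `base` cutoff are hypothetical. The recursion always halves the larger dimension, so at some depth the subproblems fit in cache regardless of the cache size or line length, and no hardware parameter appears in the code. The `base` cutoff only limits recursion overhead. Python lists of lists are not contiguous in memory, so this sketch shows the recursive structure rather than measurable cache behavior.

```python
def transpose_recursive(a, b, row0, col0, rows, cols, base=16):
    """Write the transpose of the rows x cols block of `a` starting at
    (row0, col0) into the matching block of `b` (so b[j][i] = a[i][j])."""
    if rows <= base and cols <= base:
        # Small block: transpose directly. `base` is a fixed cutoff that
        # bounds recursion overhead; it is not tuned to any cache parameter.
        for i in range(row0, row0 + rows):
            for j in range(col0, col0 + cols):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the larger dimension (rows) in half and recurse.
        half = rows // 2
        transpose_recursive(a, b, row0, col0, half, cols, base)
        transpose_recursive(a, b, row0 + half, col0, rows - half, cols, base)
    else:
        # Split the larger dimension (columns) in half and recurse.
        half = cols // 2
        transpose_recursive(a, b, row0, col0, rows, half, base)
        transpose_recursive(a, b, row0, col0 + half, rows, cols - half, base)


def transpose(a, base=16):
    """Return the transpose of the rectangular matrix `a` (list of lists)."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    if m and n:
        transpose_recursive(a, b, 0, 0, m, n, base)
    return b
```

A call such as `transpose([[1, 2, 3], [4, 5, 6]])` produces a 3-by-2 result. In a language with contiguous row-major arrays, the same recursion is what would keep the miss count near the $\Theta(1 + mn/L)$ bound quoted above, under the abstract's assumption $Z = \Omega(L^2)$.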

Similar Resources

A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation

An experimental comparison of cache aware and cache oblivious static search tree algorithms is presented. Both cache aware and cache oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache aware algorithms with implicit pointers perform best overall, but cache oblivious algorithms do almost as well and do not have to be...

Cache-oblivious wavefront algorithms for dynamic programming problems: efficient scheduling with optimal cache performance and high parallelism

Wavefront algorithms are algorithms on grids where execution proceeds in a wavefront manner from the start to the end of the execution (execution moves through the grid as if a wavefront is moving). Many dynamic programming problems and stencil computations are wavefront algorithms. Iterative wavefront algorithms for evaluating dynamic programming (DP) recurrences exploit optimal parallelism, b...

Funnel Heap - A Cache Oblivious Priority Queue

The cache oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multi-level memory model. Arge et al. recently presented the first optimal cache oblivious priority queue, and demonstr...

Cache-aware and Cache-oblivious Algorithms

Cache Efficient Simple Dynamic Programming

New cache-oblivious and cache-aware algorithms for simple dynamic programming based on Valiant’s context-free language recognition algorithm are designed, implemented, analyzed, and empirically evaluated with timing studies and cache simulations. The studies show that for large inputs the cache-oblivious and cache-aware dynamic programming algorithms are significantly faster than the standard d...

Publication date: 1999